"Sampling consists of selecting some part of the population to observe so that one may estimate something about the whole population."

— Page 1, Sampling, Third Edition, 2012.

This is a really important part of working with statistics, as there are very few occasions in the real world in which you truly have access to all the data you need. For a whole host of reasons, we are not always able to canvass every member of the population.

Example 1 (US Census vs American Community Survey)

A perfect example of this is the difference between the US Census and the American Community Survey. While over 90% of Americans are regularly counted in the Census, the American Community Survey only canvasses approximately 3.5 million Americans. The 2020 US Census cost an astonishing $15.6 billion to conduct and took 10 years to plan. This makes it extremely difficult to do yearly. On the other hand, the American Community Survey can be conducted yearly because it costs less than $200 million a year to produce.
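To put the two price tags side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above, and it treats the "less than $200 million" figure as an upper bound, so the result is a rough ceiling rather than an exact cost.

```python
# Figures quoted in the text above; the ACS figure is an upper bound ("less than").
census_cost = 15.6e9         # 2020 Census, in dollars, one count per decade
acs_cost_per_year = 200e6    # American Community Survey, in dollars per year
years = 10

# Cost of running the yearly survey for a full decade.
acs_cost_per_decade = acs_cost_per_year * years
ratio = acs_cost_per_decade / census_cost

print(f"ACS over {years} years: at most ${acs_cost_per_decade / 1e9:.1f} billion")
print(f"Share of one census: at most {ratio:.0%}")
```

Even after ten yearly rounds, the survey comes to roughly an eighth of the cost of a single census.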

These graphs show that the cost of the Census has been increasing quickly. Considering there are only nine pretty simple questions on the Census, this further illustrates that if we want an affordable way to learn about people at a countrywide level, we need sampling.

Example 2 (Random Selections From Common Distribution)

The great thing about sampling is that if you have a large enough randomized sample, you can get a very good estimate of the true value. Because these values are estimates, we use different symbols for sample statistics than for population parameters. The separate symbols matter: they signal that a value is an estimate, and estimates are not guaranteed to represent the whole population perfectly.

| Sample statistic | Population parameter | Description |
| --- | --- | --- |
| n | N | number of members of sample or population |
| x̅ “x-bar” | μ “mu” or μx | mean |
| M or Med or x̃ “x-tilde” | (none) | median |
| s (TIs say Sx) | σ “sigma” or σx | standard deviation; for variance, apply a squared symbol (s² or σ²) |
| r | ρ “rho” | coefficient of linear correlation |
| p̂ “p-hat” | p | proportion |
| z, t, χ² | (n/a) | calculated test statistic |
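To make the notation concrete, here is a minimal sketch that computes a few of these quantities for both a population and a sample. The population (10,000 heights in cm), the sample size of 200, and the 180 cm cutoff are all made up for illustration. Note that the population standard deviation divides by N while the sample standard deviation divides by n - 1.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# A made-up population: N = 10,000 heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]
N = len(population)
mu = statistics.fmean(population)      # population mean (mu)
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# A simple random sample of n = 200 members, drawn without replacement.
sample = rng.sample(population, 200)
n = len(sample)
x_bar = statistics.fmean(sample)       # sample mean (x-bar)
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

# Proportion taller than 180 cm: p for the population, p-hat for the sample.
p = sum(x > 180 for x in population) / N
p_hat = sum(x > 180 for x in sample) / n

print(f"N = {N}, n = {n}")
print(f"mu = {mu:.2f}, x-bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}, s = {s:.2f}")
print(f"p = {p:.3f}, p-hat = {p_hat:.3f}")
```

The sample values should land close to the population values, but not exactly on them, which is the reason for keeping the two sets of symbols apart.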

The normal distribution commonly comes up in statistics. If we take just 500 points from a normal distribution, we already get something that looks much like a normal distribution:
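If you want to try this yourself, the sketch below draws 500 points from a standard normal distribution and prints a crude text histogram. The seed, the choice of mean 0 and standard deviation 1, and the bin width of 0.5 are arbitrary choices for illustration.

```python
import math
import random
import statistics
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.gauss(0, 1) for _ in range(500)]  # 500 draws from N(0, 1)

x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)
print(f"x-bar = {x_bar:.3f} (true mean 0), s = {s:.3f} (true sd 1)")

# Crude text histogram: count the draws falling in each bin of width 0.5.
bins = Counter(math.floor(x / 0.5) for x in sample)
for b in sorted(bins):
    # Each '#' stands for two draws, to keep the bars short.
    print(f"{b * 0.5:5.1f} | {'#' * (bins[b] // 2)}")
```

The bars should pile up around zero and thin out on both sides, giving the familiar bell shape.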

And guess what, it also works for the uniform distribution:

and the Bernoulli distribution:

and the Beta distribution:

and even the log-normal distribution:
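One simple way to check this numerically is to compare the mean of 500 draws with the known theoretical mean of each distribution. The sketch below does that for the four distributions just mentioned. The specific parameters (for example Beta(2, 5) or a Bernoulli probability of 0.3) are assumptions picked for illustration, and matching means is only one aspect of "looking like" the distribution.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 500

# name -> (function that draws one value, theoretical mean)
distributions = {
    "uniform(0, 1)": (lambda: rng.uniform(0, 1), 0.5),
    "bernoulli(0.3)": (lambda: 1 if rng.random() < 0.3 else 0, 0.3),
    "beta(2, 5)": (lambda: rng.betavariate(2, 5), 2 / (2 + 5)),
    # Mean of a log-normal is exp(mu + sigma^2 / 2); here mu = 0, sigma = 0.5.
    "lognormal(0, 0.5)": (lambda: rng.lognormvariate(0, 0.5), math.exp(0.5 ** 2 / 2)),
}

sample_means = {}
for name, (draw, true_mean) in distributions.items():
    sample = [draw() for _ in range(n)]
    sample_means[name] = statistics.fmean(sample)
    print(f"{name:18s} sample mean = {sample_means[name]:.3f}, true mean = {true_mean:.3f}")
```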

Sampling is insanely useful, yet it is important to remember both the pros and the cons!

Pros:

  1. Lower cost: This applies to both time and money, as shown by our first graph.

  2. Some testing is destructive: If we wanted to find the breaking point of steel and tested every piece a factory made to its breaking point, we would have none left to sell.

  3. Convenience: Similar to point 1, being able to get data quickly allows you to make decisions more quickly.

Cons:

  1. Chance of bias: This may happen if the selection method doesn't give every member of the population the same chance of being selected.

  2. It's hard to select a truly representative sample: This is especially true for extremely complex, non-linear situations. A perfect example is polling, where the people who don't respond to polls may be qualitatively different from those who do. There is no simple way to represent them.

  3. Small numbers: If the population itself is very small, then any sample drawn from it can be very unrepresentative, because each individual affects the group as a whole so much.
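The first two cons can be seen in a small simulation. The sketch below compares a simple random sample with a poll in which supporters are more likely to respond than non-supporters. Everything here is hypothetical: the population of 100,000 voters, the roughly 50% true support, and the 60% and 30% response rates are made-up numbers chosen to make the effect visible.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = supports a proposal, 0 = does not (about 50% support).
population = [1 if rng.random() < 0.5 else 0 for _ in range(100_000)]
p = statistics.fmean(population)  # true proportion

# Simple random sample: every member has the same chance of being selected.
srs = rng.sample(population, 1_000)
p_hat_srs = statistics.fmean(srs)

# Poll with non-response: assume supporters answer 60% of the time
# and non-supporters only 30% of the time (made-up response rates).
contacted = rng.sample(population, 2_000)
respondents = [v for v in contacted if rng.random() < (0.6 if v == 1 else 0.3)]
p_hat_biased = statistics.fmean(respondents)

print(f"true p                 = {p:.3f}")
print(f"simple random sample   = {p_hat_srs:.3f}")
print(f"respondents only       = {p_hat_biased:.3f} ({len(respondents)} responses)")
```

The simple random sample should land near the true proportion, while the respondents-only estimate overstates support, even though it is based on a similar number of answers. Collecting more responses would not fix this, because the problem is who answers, not how many.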